How to stay safe riding roller coasters

Popular Science

Millions of people ride amusement park rides every year with few serious injuries--but experts say a few precautions still matter. Going upside down on a roller coaster might feel scary, but it's very safe. Summer is quickly approaching, which means more time for summer fun like checking out amusement parks.


Mini Amusement Parks (MAPs): A Testbed for Modelling Business Decisions

Aroca-Ouellette, Stéphane, Berlot-Attwell, Ian, Lymperopoulos, Panagiotis, Rajasekharan, Abhiramon, Zhu, Tongqi, Kang, Herin, Suleman, Kaheer, Pasupalak, Sam

arXiv.org Artificial Intelligence

Despite rapid progress in artificial intelligence, current systems struggle with the interconnected challenges that define real-world decision making. Practical domains, such as business management, require optimizing an open-ended and multi-faceted objective, actively learning environment dynamics from sparse experience, planning over long horizons in stochastic settings, and reasoning over spatial information. Yet existing human--AI benchmarks isolate subsets of these capabilities, limiting our ability to assess holistic decision-making competence. We introduce Mini Amusement Parks (MAPs), an amusement-park simulator designed to evaluate an agent's ability to model its environment, anticipate long-term consequences under uncertainty, and strategically operate a complex business. We provide human baselines and a comprehensive evaluation of state-of-the-art LLM agents, finding that humans outperform these systems by 6.5x on easy mode and 9.8x on medium mode. Our analysis reveals persistent weaknesses in long-horizon optimization, sample-efficient learning, spatial reasoning, and world modelling. By unifying these challenges within a single environment, MAPs offers a new foundation for benchmarking agents capable of adaptable decision making. Code: https://github.com/Skyfall-Research/MAPs


Equipping Retrieval-Augmented Large Language Models with Document Structure Awareness

Xu, Lingnan, Feng, Chong, Zhang, Kaiyuan, Liu, Zhengyong, Xu, Wenqiang, Meng, Fanqing

arXiv.org Artificial Intelligence

While large language models (LLMs) demonstrate impressive capabilities, their reliance on parametric knowledge often leads to factual inaccuracies. Retrieval-Augmented Generation (RAG) mitigates this by leveraging external documents, yet existing approaches treat retrieved passages as isolated chunks, ignoring valuable structure that is crucial for document organization. Motivated by this gap, we propose Retrieve-DocumentRoute-Read (RDR2), a novel framework that explicitly incorporates structural information throughout the RAG process. RDR2 employs an LLM-based router to dynamically navigate document structure trees, jointly evaluating content relevance and hierarchical relationships to assemble optimal evidence. Our key innovation lies in formulating document routing as a trainable task, with automatic action curation and structure-aware passage selection inspired by human reading strategies. Through comprehensive evaluation on five challenging datasets, RDR2 achieves state-of-the-art performance, demonstrating that explicit structural awareness significantly enhances RAG systems' ability to acquire and utilize knowledge, particularly in complex scenarios requiring multi-document synthesis.
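The routing idea in the abstract--an LLM-based router that walks a document's structure tree, weighing both content relevance and hierarchy, to assemble evidence--can be sketched as a greedy tree descent. This is a minimal illustration, not the paper's implementation: `DocNode`, `route`, the `budget` parameter, and the toy `overlap` scorer (standing in for the LLM router's judgment) are all hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class DocNode:
    """A node in a document structure tree: a heading plus its text."""
    title: str
    text: str = ""
    children: list["DocNode"] = field(default_factory=list)

def route(node: DocNode, query: str, score, budget: int = 3) -> list[DocNode]:
    """Greedily descend the structure tree, keeping the highest-scoring
    children at each level, and collect leaf passages as evidence."""
    if not node.children:
        return [node]
    ranked = sorted(node.children, key=lambda c: score(query, c), reverse=True)
    evidence = []
    for child in ranked[:budget]:
        evidence.extend(route(child, query, score, budget))
    return evidence

def overlap(query: str, node: DocNode) -> int:
    """Toy relevance scorer: counts query terms appearing in the node's
    title or text. A real router would use an LLM here."""
    haystack = (node.title + " " + node.text).lower()
    return sum(term in haystack for term in set(query.lower().split()))
```

Because the scorer sees each node's title together with its text, an irrelevant section is pruned at its heading and its entire subtree is skipped, which is the practical payoff of structure awareness over flat chunk ranking.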


Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic Tasks

Xu, Peiran, Li, Zhuohao, Xing, Xiaoying, Zhang, Guannan, Li, Debiao, Shi, Kunyu

arXiv.org Artificial Intelligence

Large Language Models (LLMs) increasingly rely on external tools such as search engines to solve complex agentic tasks that require reasoning and external knowledge retrieval. Recently, reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in advancing the capabilities of LLMs by rewarding final answers via outcome rewards. While straightforward to supervise, outcome rewards provide only sparse signals and delayed feedback, which limits their effectiveness on long trajectories. Process rewards address this by evaluating intermediate steps, providing fine-grained supervision and encouraging grounded problem solving. However, it is notoriously hard to annotate step-wise labels, especially in non-verifiable processes without "golden" answers. Furthermore, step-wise judgment requires balancing local quality against contribution to the final outcome, as optimizing toward higher process reward may not always align with better final outcomes. To address these challenges, we introduce Principle Process Reward (PPR), an RL approach that unifies principled step-level assessment and outcome verification. We train a principle-based reward model to improve the transparency and reliability of process evaluation, and further introduce a Reward Normalization (ReNorm) strategy to calibrate outcome and process rewards. Experimental results show that PPR achieves state-of-the-art performance across a wide range of benchmarks, demonstrating its robustness and generalization. Our code and model collection is available in this link.

[Figure 1: Performance of PPR on various benchmarks against other baselines.]

Large Language Models (LLMs) have achieved remarkable progress across a wide range of tasks, from open-domain question answering to multi-step reasoning (Guo et al., 2025; OpenAI, 2025b; Comanici et al., 2025).
A key factor in this success is their ability to leverage external tools such as search engines, calculators, code interpreters, and browsers (DeepMind, 2025; Guo et al., 2024; OpenAI, 2025a). In particular, the search engine is a linchpin tool that provides verifiable and up-to-date knowledge for LLMs, helping to ground their answers and reduce hallucinations. However, training LLM agents to use tools effectively remains challenging, as it involves complex behavior: task decomposition, query generation, information aggregation, and stopping decisions.
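The calibration problem described above--process rewards whose scale can drown out, or conflict with, the outcome signal--can be illustrated with a simple normalization. This is a hypothetical sketch of the general idea, not the paper's actual ReNorm algorithm: the function name, the `beta` blending weight, and the standardize-then-blend scheme are all assumptions.

```python
import statistics

def renorm(process_rewards: list[float], outcome_reward: float,
           beta: float = 0.5) -> list[float]:
    """Standardize per-step process rewards so their scale is comparable
    to the outcome signal, then blend each step with the trajectory-level
    outcome reward as a shared term."""
    mu = statistics.fmean(process_rewards)
    sigma = statistics.pstdev(process_rewards) or 1.0  # guard against zero spread
    normed = [(r - mu) / sigma for r in process_rewards]
    return [beta * p + (1 - beta) * outcome_reward for p in normed]
```

After standardization the process term has zero mean, so across a trajectory the blended rewards average to the outcome reward: step-level preferences reorder steps without shifting the optimization away from the final answer.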


MoqaGPT: Zero-Shot Multi-modal Open-domain Question Answering with Large Language Model

Zhang, Le, Wu, Yihong, Mo, Fengran, Nie, Jian-Yun, Agrawal, Aishwarya

arXiv.org Artificial Intelligence

Multi-modal open-domain question answering typically requires evidence retrieval from databases across diverse modalities, such as images, tables, passages, etc. Even Large Language Models (LLMs) like GPT-4 fall short in this task. To enable LLMs to tackle the task in a zero-shot manner, we introduce MoqaGPT, a straightforward and flexible framework. Using a divide-and-conquer strategy that bypasses intricate multi-modality ranking, our framework can accommodate new modalities and seamlessly transition to new models for the task. Built upon LLMs, MoqaGPT retrieves and extracts answers from each modality separately, then fuses this multi-modal information using LLMs to produce a final answer. Our methodology boosts performance on the MMCoQA dataset, improving F1 by +37.91 points and EM by +34.07 points over the supervised baseline. On the MultiModalQA dataset, MoqaGPT surpasses the zero-shot baseline, improving F1 by 9.5 points and EM by 10.1 points, and significantly closes the gap with supervised methods. Our codebase is available at https://github.com/lezhang7/MOQAGPT.
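The divide-and-conquer strategy the abstract describes--answer per modality first, fuse with an LLM second--reduces to a small control loop. The sketch below is illustrative only: `moqa_answer`, `retrievers`, `reader`, and `fuser` are hypothetical names, with the latter two standing in for LLM calls in the real framework.

```python
from typing import Callable, Dict

def moqa_answer(question: str,
                retrievers: Dict[str, Callable[[str], str]],
                reader: Callable[[str, str], str],
                fuser: Callable[[str, Dict[str, str]], str]) -> str:
    """Divide and conquer: obtain a candidate answer from each modality
    independently, then fuse the candidates into one final answer.
    No cross-modality ranking is needed at retrieval time."""
    candidates = {}
    for modality, retrieve in retrievers.items():
        evidence = retrieve(question)             # modality-specific retrieval
        candidates[modality] = reader(question, evidence)  # per-modality answer
    return fuser(question, candidates)            # LLM-based fusion step
```

Because each modality is handled by its own retriever entry, supporting a new modality (or swapping in a stronger model) only means adding or replacing one dictionary entry, which is the flexibility the abstract claims.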


Gaming Giant Unity Wants to Digitally Clone the World

WIRED

In video games, non-playable characters can be somewhat clueless. An NPC might wander across a city block and face-plant into a streetlamp, and then maybe vanish the next block over. NPCs leap into player-characters' punches or commit to kicking a wall 400 times, never learning that the wall won't kick back. Unity Technologies is in the business of NPCs. Founded in 2004, Unity makes an eponymous game engine that provides the architecture for hundreds of video games using its real-time 3D computer graphics technology.


Leading the Robot Invasion of the Old Boys' Club

#artificialintelligence

Trailblazers Week celebrates the women who have pushed boundaries and paved the way for others in their industries. Manuela Veloso grew up in Portugal in the 1960s and '70s in a household where innovations, from the moon landing to the building of a huge bridge in Lisbon, were the subject of dinner-table discussion. In 1994, she moved to the U.S. to earn a master's degree in computer science, and she went on to get her Ph.D. at Carnegie Mellon. It was the golden era of artificial intelligence, the "years of deep thoughts, chess playing, hopping robots," she tells the Cut. Veloso spent more than two decades at the university, working her way up to become the head of its machine-learning department, and has been researching artificial intelligence ever since -- now as head of AI research at JPMorgan and professor emeritus at Carnegie Mellon.


Poker-Faced Trading: Will This Theory Change Your Strategy?

#artificialintelligence

Sure, you've heard of game theory. And sure, trading is like a game: you devise a strategy, learn the rules, and try to beat everyone else to the punch, finding trends before anyone else does. You've even heard that letting your emotions get the best of you is a terrible trading strategy. To start to undermine our own self-destructive habits, it helps to understand how the game is geared toward those who figure out the game behind the game: how to play to win by controlling the emotions that lead us to bad decisions. Game theory can be applied to human trading because the object of a trade is to "win" a profit.


The Top Data Science Courses at Udemy

#artificialintelligence

There's no doubt about it - Data Science is big news right now. We see it on the news every day, the increasing number of news stories about Big Data, the Internet of Things, Deep Learning, Artificial Intelligence, smart cars, smart cities, smart politicians. OK, maybe I went a bit too far with that last one... There's also a great appetite for learning about Data Science too. Every month I get an email from Udemy telling me which courses are their best sellers. The list isn't about Data Science, but there are always plenty of Data Science courses right up there at the top of the list.


Gifts for video game lovers

Los Angeles Times

Clockwise, from top left: Nintendo; Lenovo; Build-A-Bear's Pokémon collection; "Beasts of Balance" (Nintendo; Lenovo; Build-A-Bear; Beasts of Balance) Though this itty-bitty retro gaming console may be relatively hard to find, it's worth the hunt, as it represents some of Nintendo's best work. "Super Mario World" expanded the universe for our favorite plumber, and games like "Earthbound" brought serious topics to the home console. Scott Pilgrim's "Precious Little Card Game" pits you -- and up to three of your friends -- against one of the dreaded evil exes from Bryan Lee O'Malley's "Scott Pilgrim" series. But don't worry -- damage will be dealt, but no hearts will be broken. "Super Mario Odyssey," Mario's first adventure for the Nintendo Switch, looks to be one of the oddest in the series yet -- and remember, this is a game with walking mushrooms.